CRISPRstrand: predicting repeat orientations to determine the crRNA-encoding strand at CRISPR loci
نویسندگان
چکیده
MOTIVATION The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. RESULTS We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. AVAILABILITY CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
Modulation of CRISPR locus transcription by the repeat-binding protein Cbp1 in Sulfolobus
CRISPR loci are essential components of the adaptive immune system of archaea and bacteria. They consist of long arrays of repeats separated by DNA spacers encoding guide RNAs (crRNA), which target foreign genetic elements. Cbp1 (CRISPR DNA repeat binding protein) binds specifically to the multiple direct repeats of CRISPR loci of members of the acidothermophilic, crenarchaeal order Sulfolobale...
متن کاملAccurate computational prediction of the transcribed strand of CRISPR non-coding RNAs
MOTIVATION CRISPR RNAs (crRNAs) are a type of small non-coding RNA that form a key part of an acquired immune system in prokaryotes. Specific prediction methods find crRNA-encoding loci in nearly half of sequenced bacterial, and three quarters of archaeal, species. These Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) arrays consist of repeat elements alternating with specifi...
متن کاملCRISPR-Cas: the effective immune systems in the prokaryotes
Approximately all sequenced archaeal and half of eubacterial genomes have some sort of adaptive immune system, which enables them to target and cleave invading foreign genetic elements by an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of the CRISPR loci with multiple copies of a short repeat sequence separa...
متن کاملCharacterization of CRISPR RNA transcription by exploiting stranded metatranscriptomic data.
CRISPR-Cas systems are bacterial adaptive immune systems, each typically composed of a locus of cas genes and a CRISPR array of spacers flanked by repeats. Processed transcripts of CRISPR arrays (crRNAs) play important roles in the interference process mediated by these systems, guiding targeted immunity. Here we developed computational approaches that allow us to characterize the expression of...
متن کاملBiogenesis pathways of RNA guides in archaeal and bacterial CRISPR-Cas adaptive immunity.
CRISPR-Cas is an RNA-mediated adaptive immune system that defends bacteria and archaea against mobile genetic elements. Short mature CRISPR RNAs (crRNAs) are key elements in the interference step of the immune pathway. A CRISPR array composed of a series of repeats interspaced by spacer sequences acquired from invading mobile genomes is transcribed as a precursor crRNA (pre-crRNA) molecule. Thi...
متن کامل